In this paper, an experiment has been carried out with a simple k-nearest neighbor (kNN) classifier to
investigate the capability of three extracted facial features for better recognition of facial emotions.
The feature extraction techniques used are histogram of oriented gradients (HOG), Gabor, and local binary
pattern (LBP). The features are compared using performance indices such as average recognition accuracy,
overall recognition accuracy, precision, recall, kappa coefficient, and computation time. Two databases,
Cohn-Kanade (CK+) and Japanese female facial expression (JAFFE), have been used. Different
training-to-testing data division ratios are explored to find the best one from a performance point of
view. Of the three extracted features, Gabor produced 94.8% average accuracy, the best among all, although
it required the highest computation time. LBP showed 88.2% average accuracy with a computation time lower
than that of Gabor, while HOG showed the lowest average accuracy of 55.2% with the lowest computation time.
This is an open access article under the CC BY-SA license. 
1. INTRODUCTION

Among the many modalities of human affective states, facial expression remains a significant mode of
communicating an individual's state of mind. Facial expression accounts for 55% of the entire emotional
information, compared with 38% conveyed by voice and 7% by spoken words [1], [2]. Among these modalities,
the recognition of emotions using facial expressions (RFE) remains a complex domain of research due to the
absence of standard features that adequately describe these states. It remains significant in the areas of
human-machine interaction and pattern recognition [3]-[5].

There are two major approaches to facial emotion recognition: appearance-based and geometric-based
models [6], [7]. Techniques based on geometric models do not consider changes in the skin surface, such as
the prominent wrinkles that appear during an expression. In contrast, appearance-based techniques use the
whole face or specific regions of the facial image to represent the hidden information [8]-[10].

In this regard, the Gabor filter is an appropriate strategy for recognizing human expressive states and has
shown promising results in earlier work. The technique is suitable for extracting multi-scale,
multi-orientation changes on an expressive facial surface while being relatively insensitive to changes in
brightness. It targets the prominent features of emotion by focusing on variations in the edges and texture
of an image [11], [12]. In contrast, the histogram of oriented gradients (HOG) method builds a histogram for
each cell of several pixels by estimating the luminance gradient of each pixel. It is a geometric-based
approach in which the luminance gradient uses the adjacent pixels (top, bottom, left, and right) to compute
the magnitude and direction of the intensity variation within a cell. The main properties of the local
binary pattern (LBP) are its robustness to illumination changes and its computational simplicity [13].
Comparing the applicability of the HOG and LBP techniques with that of the Gabor filter for RFE, under
different training-to-testing data ratios and orientations and using measures such as the kappa
coefficient, can provide new insights to researchers and is therefore investigated here [14].

Classification algorithms play an important role in the identification of facial expressions (FE).
Earlier literature in RFE has explored several classification mechanisms such as random forest, naive Bayes,
support vector machine (SVM), hidden Markov model (HMM), AdaBoost, multilayer neural networks, decision tree,
k-nearest neighbors, and deep neural networks, with excellent results [13]-[15]. Nevertheless, the chosen
classification algorithm should be reliable, comprehensive, and fast, and it should address the challenges
of subject dependency, variation in illumination, and head position during the affective states [16].

Section 2 describes the chosen feature extraction techniques in detail. Section 3 describes the chosen
databases and the k-nearest neighbor (kNN) classifier. Section 4 explains the simulation results obtained
with the chosen classifier and the extracted feature sets, and Section 5 concludes the work with future
directions.


2. FEATURE EXTRACTION METHOD 

The facial image identification model is shown in Figure 1. It comprises components for image
acquisition, pre-processing, feature extraction, and classification. After an FE image is captured with a
camera, it is pre-processed to minimize variations due to the environment and other sources. The
pre-processing step involves image scaling, adjustment of contrast and brightness, and image enhancement.
As the facial images of the chosen Japanese female facial expression (JAFFE) and Cohn-Kanade (CK+)
databases have already been pre-processed, this step is not required here. This work explores the Gabor
filter, LBP, and HOG feature extraction techniques to classify the FE states using facial images. These
techniques are briefly explained in the following subsections.


[Figure 1 block diagram: facial expressive image → pre-processing (scaling, brightness and contrast
adjustment, enhancement) → feature extraction (HOG, Gabor, LBP) → kNN classifier → output emotions (angry,
disgust, fear, happy, sad, surprise)]


Figure 1. The facial image identification modeling


2.1. Histogram of oriented gradients 

The HOG technique is considered here as it captures both local and global facial expression attributes at
different orientations and scales. The features are sensitive to variations in object shape, so they
describe an object reliably when its shape is consistent [17]. This work uses nine-bin histograms,
representing the direction and strength of edges, computed over 4 × 4 cells for each patch. The features of
each active facial patch are concatenated to form the desired feature vector [18].

In the HOG approach, the vertical and horizontal gradients at the pixel z(s, t) are computed as

$$G_y = z(s-1,t) - z(s+1,t) \tag{1}$$

$$G_x = z(s,t-1) - z(s,t+1) \tag{2}$$

The gradient magnitude is given by

$$G = \sqrt{G_x^2 + G_y^2} \tag{3}$$

and the orientation of the bin by

$$\theta = \arctan\left(\frac{G_y}{G_x}\right) \tag{4}$$


where θ denotes the bin angle. Both the magnitude and the bin angle are used to form the HOG feature
vector. In this work, the size of each HOG cell is fixed at 8 × 16 pixels. This makes it possible to focus
on the changes in the shape of the eyes, mouth, and eyebrows, which vary more vertically during an emotional
expression. To choose the cell size, we varied it from 2 × 2 to 64 × 64 pixels, covering the possible
combinations of vertical and horizontal dimensions, and recorded the RFE accuracy. The 8 × 16 cell size
provided the highest accuracy and is therefore kept for further processing. As the cell size increases,
image details are lost, although the feature vector becomes shorter and computation faster; conversely,
smaller cells retain more detail at the cost of a larger feature vector and longer computation time.
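The cell-based histogram construction described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' MATLAB implementation: it uses the central differences of Eqs. (1) and (2), folds the orientation of Eq. (4) into the unsigned range [0°, 180°), lets each pixel cast a magnitude-weighted vote into one of nine equal-width bins (no interpolation between bins), and omits the block normalization of standard HOG. The function name `hog_cells` and the reading of 8 × 16 as 8 rows by 16 columns are assumptions.

```python
import numpy as np


def hog_cells(img, cell=(8, 16), n_bins=9):
    """Per-cell HOG orientation histograms (simplified: no block normalization).

    img  : 2-D grayscale array, rows indexed by s and columns by t.
    cell : (rows, columns) per cell; (8, 16) is an assumed reading of "8 x 16".
    Pixels that do not fill a whole cell at the right/bottom edge are ignored.
    """
    z = img.astype(float)
    gy = np.zeros_like(z)
    gx = np.zeros_like(z)
    # Eq. (1): G_y = z(s-1, t) - z(s+1, t)  (border rows left at zero)
    gy[1:-1, :] = z[:-2, :] - z[2:, :]
    # Eq. (2): G_x = z(s, t-1) - z(s, t+1)  (border columns left at zero)
    gx[:, 1:-1] = z[:, :-2] - z[:, 2:]
    mag = np.hypot(gx, gy)  # Eq. (3)
    # Eq. (4); arctan2 avoids division by zero, and mod 180 gives the
    # unsigned orientation in [0, 180) degrees.
    theta = np.rad2deg(np.arctan2(gy, gx)) % 180.0
    bins = np.minimum((theta / (180.0 / n_bins)).astype(int), n_bins - 1)

    ch, cw = cell
    n_r, n_c = z.shape[0] // ch, z.shape[1] // cw
    feats = np.zeros((n_r, n_c, n_bins))
    for i in range(n_r):
        for j in range(n_c):
            rows = slice(i * ch, (i + 1) * ch)
            cols = slice(j * cw, (j + 1) * cw)
            # each pixel votes its gradient magnitude into its orientation bin
            feats[i, j] = np.bincount(bins[rows, cols].ravel(),
                                      weights=mag[rows, cols].ravel(),
                                      minlength=n_bins)
    return feats.ravel()  # concatenated cell histograms


# Usage: the descriptor length shrinks as the cells grow.
rng = np.random.default_rng(0)
face = rng.integers(0, 256, size=(64, 64))  # stand-in for a grayscale face crop
for cell in [(4, 4), (8, 16), (16, 16)]:
    print(cell, hog_cells(face, cell).size)  # 2304, 288, 144
```

The printed lengths make the cell-size trade-off concrete: on a 64 × 64 image, 4 × 4 cells give 2304 values, 8 × 16 cells give 288, and 16 × 16 cells give 144.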


2.2. Local binary pattern 

LBP is a popular, efficient, and simple texture descriptor used in many computer vision problems [19]. It
captures the spatial pattern along with the grayscale contrast through simple thresholding: the intensities
of the neighboring pixels are compared with that of the center pixel, resulting in a binary pattern termed
LBP [20]. The basic LBP operation with a 3 × 3 window is expressed as


$$\mathrm{LBP}(x_c, y_c) = \sum_{n=0}^{7} s(i_n - i_c)\,2^n \tag{5}$$

where i_c is the intensity of the central pixel (x_c, y_c), i_n is the gray value of the n-th of its eight
neighboring pixels, and s(i_n - i_c) = 1 if i_n - i_c ≥ 0, else s(i_n - i_c) = 0. In general form, with P
sampling points on a circle of radius R, the operator is denoted as

$$\mathrm{LBP}_{P,R} = \sum_{i=0}^{P-1} s(g_i - g_c)\,2^i \tag{6}$$

where g_c is the gray value of the central pixel, g_i is the gray value of the i-th neighboring pixel, and
s(x) is the unit step function defined as

$$s(x) = \begin{cases} 1, & x \ge 0 \\ 0, & x < 0 \end{cases}$$


Multi-resolution analysis can be accomplished by choosing different values of R and P. Figure 2 shows LBP
operators with three different sampling configurations. From left to right, they are the LBP_{4,1},
LBP_{8,1}, and LBP_{8,2} operators, respectively.


[Figure 2 panels, left to right: (P = 4, R = 1), (P = 8, R = 1), (P = 8, R = 2)]


Figure 2. An example of a basic LBP operation


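After applying the LBP operator to an image, the histogram of the labeled image f_l(x, y) is calculated as

$$H_i = \sum_{x,y} I\{f_l(x,y) = i\}, \quad i = 0, \ldots, n-1 \tag{7}$$

where n is the number of different labels produced by the LBP operator, and

$$I(A) = \begin{cases} 1, & A \text{ is true} \\ 0, & A \text{ is false} \end{cases}$$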
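Eqs. (5) and (7) can be illustrated with a short NumPy sketch that computes the basic 3 × 3 LBP code of every interior pixel and then the label histogram. The clockwise neighbor ordering starting at the top-left pixel (which fixes which neighbor receives the weight 2^n) is an assumption, as is skipping the one-pixel image border; circular LBP_{P,R} sampling with interpolation is not shown. The names `lbp_image` and `lbp_histogram` are hypothetical.

```python
import numpy as np

# (row, col) offsets of the eight neighbours, clockwise from the top-left.
# This ordering is an assumption; any fixed ordering gives a valid basic LBP.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_image(img):
    """Basic 3x3 LBP code of Eq. (5) for every interior pixel."""
    g = np.asarray(img).astype(np.int64)
    h, w = g.shape
    c = g[1:-1, 1:-1]                          # central pixels i_c
    codes = np.zeros_like(c)
    for n, (dr, dc) in enumerate(OFFSETS):
        nb = g[1 + dr:h - 1 + dr, 1 + dc:w - 1 + dc]   # neighbour i_n
        # s(i_n - i_c) is 1 when i_n >= i_c; weight it by 2**n
        codes += (nb >= c).astype(np.int64) << n
    return codes


def lbp_histogram(codes, n_labels=256):
    """Eq. (7): H_i = number of pixels whose LBP label equals i."""
    return np.bincount(codes.ravel(), minlength=n_labels)


patch = np.array([[1, 2, 3],
                  [8, 5, 4],
                  [7, 6, 9]])
print(lbp_image(patch))  # [[240]]
```

In the example, the neighbors 9, 6, 7, and 8 are not smaller than the center value 5, so bits 4 to 7 are set and the code is 16 + 32 + 64 + 128 = 240. The histogram, rather than the code image, is what would be fed to the classifier.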
2.3. Gabor filters

The Gabor filter is a linear filter that provides a joint spatial and frequency-domain representation of a
signal. It can provide important information on emotions because the filter approximates human visual
perception well [21]. It can be expressed as the product of a complex exponential function and a 2D
Gaussian function.


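$$f(a,b) = \exp\left(-\frac{a_1^2 + \gamma^2 b_1^2}{2\sigma^2}\right)\exp\left(i\left(2\pi\frac{a_1}{\lambda} + \psi\right)\right) \tag{8}$$

where a_1 = a cos θ + b sin θ and b_1 = -a sin θ + b cos θ. Here θ, λ, ψ, σ, and γ denote the orientation
in degrees, the wavelength, the phase offset, the standard deviation of the Gaussian envelope, and the
spatial aspect ratio, respectively. Taking the real component of (8), the Gabor filter becomes

$$f(a,b) = \exp\left(-\frac{a_1^2 + \gamma^2 b_1^2}{2\sigma^2}\right)\cos\left(2\pi\frac{a_1}{\lambda} + \psi\right) \tag{9}$$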
This work builds the Gabor filters on a 39 × 39 pixel window. Earlier researchers in this direction have
employed approximately seven or eight different values of θ and four to five different values of λ. For our
purpose, however, three values of λ = {3, 8, 13} and four different values of θ have been chosen after a few
iterations, while the parameters γ = 0.5, σ = 0.56λ, and ψ = 0 are kept constant [22]. The input image I is
convolved with the Gabor filter f to extract the Gabor features F for a specific θ and λ:

$$F_{\lambda,\theta} = I * f_{\lambda,\theta} \tag{10}$$
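A filter bank following Eqs. (9) and (10) can be sketched as follows, using the parameters stated above (39 × 39 window, λ ∈ {3, 8, 13}, γ = 0.5, σ = 0.56λ, ψ = 0). The four orientations 0°, 45°, 90°, and 135° are an assumption (evenly spaced angles are a common choice). The convolution is done with NumPy's FFT, and the raw responses are concatenated without the downsampling that practical pipelines usually apply to keep the vector short. The function names are hypothetical and this is not the authors' MATLAB code.

```python
import numpy as np


def gabor_kernel(lam, theta, size=39, gamma=0.5, sigma_ratio=0.56, psi=0.0):
    """Real Gabor kernel of Eq. (9) on a size x size grid centred at (0, 0).

    lam: wavelength in pixels; theta: orientation in radians;
    sigma = sigma_ratio * lam, i.e. sigma = 0.56 * lambda as in the text.
    """
    half = size // 2
    b, a = np.mgrid[-half:half + 1, -half:half + 1].astype(float)  # b: rows, a: cols
    a1 = a * np.cos(theta) + b * np.sin(theta)
    b1 = -a * np.sin(theta) + b * np.cos(theta)
    sigma = sigma_ratio * lam
    envelope = np.exp(-(a1 ** 2 + gamma ** 2 * b1 ** 2) / (2 * sigma ** 2))
    return envelope * np.cos(2 * np.pi * a1 / lam + psi)


def convolve_same(img, kernel):
    """2-D linear convolution via zero-padded FFT, cropped to the image size."""
    H, W = img.shape
    kh, kw = kernel.shape
    shape = (H + kh - 1, W + kw - 1)
    full = np.real(np.fft.ifft2(np.fft.fft2(img, shape) * np.fft.fft2(kernel, shape)))
    r0, c0 = kh // 2, kw // 2
    return full[r0:r0 + H, c0:c0 + W]


def gabor_bank_features(img, lams=(3, 8, 13), thetas_deg=(0, 45, 90, 135)):
    """Eq. (10), F = I * f, for every (lambda, theta) pair, concatenated.

    The orientation set thetas_deg is an assumption (four evenly spaced angles).
    """
    img = np.asarray(img, dtype=float)
    responses = [convolve_same(img, gabor_kernel(lam, np.deg2rad(t)))
                 for lam in lams for t in thetas_deg]
    return np.concatenate([r.ravel() for r in responses])
```

For a 256 × 256 JAFFE image, this bank yields 12 responses of 65,536 values each (786,432 in total), which illustrates why the Gabor scheme is the most expensive of the three in both memory and computation time.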


3. PROPOSED METHOD 
3.1. Japanese female facial expression (JAFFE) database 

The JAFFE database is easily accessible and has been used by several researchers in RFE, which provides a
uniform platform for comparison; hence it is considered here. The images are stored in grayscale at a
resolution of 256 × 256 pixels. Sample happy, disgust, fear, angry, neutral, sad, and surprise expressions
from the JAFFE database are provided in Figure 3. We have considered 188 images covering the six basic
emotions in this work.


Figure 3. Sample images of the JAFFE database 


3.2. CK+ database 

The extended Cohn-Kanade (CK+) database contains facial expressions of 123 college students. From this
database, we chose 928 image sequences from the 123 subjects, with 1 to 6 emotions per subject. The 928
images comprise 135 anger, 207 happy, 84 sad, 249 surprise, 75 fear, and 177 disgust FEs.
Figure 4 provides the sample images of CK+ emotional expressive states. 


3.3. Classification 


kNN is a non-parametric supervised learning algorithm used for classification as well as regression. It
relies on feature similarity to classify new data points: a new sample is assigned a class based on how
closely it matches the samples in the training set [23]. The class is assigned using a distance measure
such as the Euclidean norm. For vectors p = (p_1, p_2, ..., p_m) and q = (q_1, q_2, ..., q_m), the distance
can be expressed as

$$d(p,q) = \sqrt{\sum_{i=1}^{m} (p_i - q_i)^2} \tag{11}$$
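The decision rule above can be sketched in a few lines of NumPy: Eq. (11) gives the distance from a test vector to every training vector, and the labels of the k nearest training vectors vote. The value of k is not stated in this section, so `k` is left as a parameter (k = 1 reduces to nearest-neighbor assignment). The majority vote, the tie-breaking in favor of the closest label, and the name `knn_predict` are assumptions; this is not the MATLAB routine used in the experiments.

```python
from collections import Counter

import numpy as np


def euclidean(p, q):
    """Eq. (11): d(p, q) = sqrt(sum_i (p_i - q_i)^2), along the last axis."""
    diff = np.asarray(p, dtype=float) - np.asarray(q, dtype=float)
    return np.sqrt(np.sum(diff ** 2, axis=-1))


def knn_predict(X_train, y_train, X_test, k=1):
    """Assign each test vector the majority label among its k nearest neighbours."""
    X_train = np.asarray(X_train, dtype=float)
    preds = []
    for x in np.asarray(X_test, dtype=float):
        d = euclidean(X_train, x)                   # distance to every training vector
        nearest = np.argsort(d, kind="stable")[:k]  # indices of the k closest
        votes = Counter(y_train[i] for i in nearest)
        # most_common keeps first-seen order on ties, i.e. the closer label wins
        preds.append(votes.most_common(1)[0][0])
    return np.array(preds)


X_train = np.array([[0, 0], [0, 1], [10, 10], [10, 11]])
y_train = np.array(["happy", "happy", "sad", "sad"])
print(knn_predict(X_train, y_train, [[0.2, 0.3], [9, 9]], k=3))  # ['happy' 'sad']
```

In the experiments, each row of `X_train` would be one HOG, LBP, or Gabor feature vector and each label one of the six emotions.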


Figure 4. Sample images of CK+ emotional expressive states 


4. RESULTS AND DISCUSSIONS 

The kNN classifier has been used to classify the extracted feature sets into six basic emotions. Different
training-to-testing data division ratios, namely 70%/30%, 60%/40%, 50%/50%, 40%/60%, and 30%/70%, have been
tried on the chosen JAFFE and CK+ databases to assess the best possible recognition accuracy with the
classifier. The 70%/30% division provided the desired level of accuracy and has therefore been retained for
this work. Figure 5 compares the kNN accuracy obtained with the extracted feature sets for the different
data division ratios: Figure 5 (a) shows the accuracy for the JAFFE dataset, whereas Figure 5 (b) shows it
for the CK+ dataset.
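The data-division experiment can be outlined with a per-class random split. The text does not state whether the split was stratified by emotion or how it was randomized, so both the stratification and the fixed `seed` are assumptions, and `stratified_split` is a hypothetical helper. Splitting within each class keeps the emotions in the same proportions in the training and test sets, which matters for unbalanced data such as CK+ (for example, 249 surprise versus 75 fear images).

```python
import numpy as np


def stratified_split(labels, train_frac, seed=0):
    """Return (train_idx, test_idx), taking round(train_frac * n_c) samples
    of each class c for training and the rest for testing."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    train, test = [], []
    for c in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == c))  # shuffle within the class
        n_train = int(round(train_frac * idx.size))
        train.extend(idx[:n_train])
        test.extend(idx[n_train:])
    return np.sort(np.array(train, dtype=int)), np.sort(np.array(test, dtype=int))


# The five training/testing ratios explored in the text, on toy labels.
labels = np.repeat(["anger", "happy", "sad"], [10, 20, 30])
for frac in (0.7, 0.6, 0.5, 0.4, 0.3):
    train_idx, test_idx = stratified_split(labels, frac)
    print(f"{frac:.0%} training: {train_idx.size} train / {test_idx.size} test")
```

Repeating the split with several seeds and averaging the accuracy would reduce the influence of any single random partition on the choice of ratio.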
Figure 5. The comparison of kNN accuracy using the extracted feature sets with different data division ratios
for (a) JAFFE dataset and (b) CK+ dataset


Training and testing were carried out separately on three feature sets, i.e., HOG, LBP, and Gabor, with a
kNN classifier. The performance of the classifier was evaluated on each feature set independently, for both
the CK+ and JAFFE databases. The potential of each feature set can thus be measured indirectly from the
classifier's performance in terms of average recognition accuracy, overall accuracy, precision, recall, and
kappa coefficient. All of these can be calculated from the confusion matrix, whose diagonal holds the number
of correctly identified facial emotions. Sample confusion matrices of the classifier with HOG and LBP
features on the CK+ database are displayed in Figure 6: Figure 6 (a) shows the kNN confusion matrix using
the HOG feature vector, whereas Figure 6 (b) shows the confusion matrix using the LBP feature vector.
Similarly, Figure 7 (a) displays the confusion matrix using Gabor features for the CK+ database, whereas
Figure 7 (b) shows it for the JAFFE database. The confusion matrices of the kNN classifier for the JAFFE
dataset with the HOG and LBP feature vectors are shown in Figure 8 (a) and Figure 8 (b), respectively.
Figure 6. The testing confusion matrix using kNN classifier for CK+ dataset using (a) HOG feature set and
(b) LBP feature set


Figure 7. The testing confusion matrix using kNN classifier with Gabor feature set for (a) CK+ dataset and
(b) JAFFE dataset
Figure 8. The testing confusion matrix using kNN classifier for JAFFE dataset using (a) HOG feature set and
(b) LBP feature set
The proposed schemes have been implemented in MATLAB R2018b on a laptop with an Intel® Core™ i3-2330M
CPU @ 2.20 GHz, 4 GB RAM, and a 64-bit OS. The performance parameters used in this paper to discriminate
between the feature sets are defined as follows.

a. Overall accuracy: the ratio of the number of correctly classified samples to the total number of
samples tested,

$$\text{Overall accuracy} = \frac{\text{TruePositive} + \text{TrueNegative}}{\text{TruePositive} + \text{TrueNegative} + \text{FalsePositive} + \text{FalseNegative}}$$


b. Average accuracy: the sum of the accuracies of the individual classes divided by the total number of
classes.
c. Precision: given as

$$\text{Precision} = \frac{\text{TruePositive}}{\text{TruePositive} + \text{FalsePositive}}$$

d. Recall: given as

$$\text{Recall} = \frac{\text{TruePositive}}{\text{TruePositive} + \text{FalseNegative}}$$


e. Kappa coefficient (K): a statistic used to measure the agreement between two or more observers [24]. A
value of K < 0 indicates no agreement, 0-0.20 slight agreement, 0.21-0.40 fair agreement, 0.41-0.60 moderate
agreement, 0.61-0.80 substantial agreement, and 0.81-1 almost perfect agreement [25].
f. Testing time: the total time required for testing the samples.

Table 1 and Table 2 show the recognition accuracies of individual emotions for kNN with the three feature
schemes on the CK+ and JAFFE datasets, respectively. As observed in Table 1, the surprise state has shown the
highest accuracy with both kNN+HOG and kNN+LBP. With Gabor+kNN, the happy state reached 100% accuracy on both
datasets, and the fear state also reached 100% on CK+.
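The indices (a) to (e) can all be derived from a multi-class confusion matrix, as the following NumPy sketch illustrates. Overall accuracy is taken as the trace divided by the total; the accuracy of each class is taken as its diagonal entry divided by its row total (per-class recall); precision and recall are computed per class and macro-averaged; and kappa is Cohen's κ = (p_o - p_e)/(1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance. The macro-averaging is an assumption: under it, average accuracy and recall coincide, whereas Tables 3 and 4 report slightly different values, so the authors' exact averaging may differ. `metrics_from_confusion` is a hypothetical name.

```python
import numpy as np


def metrics_from_confusion(cm):
    """Indices (a)-(e) from a square confusion matrix.

    cm[i, j] = number of test samples of true class i predicted as class j,
    so correct decisions lie on the diagonal. A class that is never
    predicted makes its precision undefined (division by zero).
    """
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    tp = np.diag(cm)
    overall = tp.sum() / n                      # (a) correct / total
    recall_c = tp / cm.sum(axis=1)              # per-class accuracy (row-wise)
    precision_c = tp / cm.sum(axis=0)           # per-class precision (column-wise)
    # (e) Cohen's kappa: observed agreement vs. agreement expected by chance
    p_e = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n ** 2
    kappa = (overall - p_e) / (1 - p_e)
    return {
        "overall_accuracy": overall,
        "average_accuracy": recall_c.mean(),    # (b) mean of class accuracies
        "precision": precision_c.mean(),        # (c) macro-averaged (assumption)
        "recall": recall_c.mean(),              # (d) macro-averaged (assumption)
        "kappa": kappa,
    }


cm = [[4, 1],
      [2, 3]]
print({k: round(float(v), 3) for k, v in metrics_from_confusion(cm).items()})
# {'overall_accuracy': 0.7, 'average_accuracy': 0.7, 'precision': 0.708, 'recall': 0.7, 'kappa': 0.4}
```

In this two-class example, 70% of the decisions are correct while chance alone would give 50%, so κ = (0.7 - 0.5)/(1 - 0.5) = 0.4, which falls in the "fair agreement" band of the scale in (e).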

Table 3 and Table 4 list all the performance parameters of the three schemes for the CK+ and JAFFE
databases, respectively. These tables show that the Gabor feature yields about 92-94% recognition accuracy,
the best among all, which may be attributed to the image-enhancing capability of the Gabor transform that
makes the task easier for the classifier. The recognition accuracy of the HOG-based scheme is the lowest,
possibly due to its limited structural information, while the LBP-based scheme falls in between. The
computation time is highest for the Gabor-based scheme because of its multi-resolution processing, and
lowest for the HOG-based scheme.


Table 1. The percentage recognition accuracy of individual emotions for the CK+ database

Emotion     kNN+HOG   kNN+LBP   kNN+Gabor
Anger       87.5      92.5      92.5
Happy       83.9      94.4      100
Disgust     84.9      68.1      73.9
Sad         60.0      88.7      93.5
Fear        73.9      84.0      100
Surprise    94.7      98.6      96.0


Table 2. The percentage recognition accuracy of individual emotions for the JAFFE database

Emotion     kNN+HOG   kNN+LBP   kNN+Gabor
Anger       33.33     88.88     100
Sad         44.44     55.55     88.88
Happy       60.00     90.00     100
Disgust     12.50     87.50     87.50
Surprise    66.66     77.77     100
Fear        55.55     77.77     77.77


Table 3. Performance comparison of three different feature extractors for CK+ database

Feature   Overall accuracy (%)   Average accuracy (%)   Precision   Recall   Kappa coefficient   Computation time (s)
HOG       84.53                  80.81                  0.80        0.82     0.80                2.1
LBP       90.65                  90.31                  0.90        0.92     0.88                2.2
Gabor     94.24                  92.66                  0.92        0.94     0.92                4.0


Table 4. Performance comparison of three different feature extractors for JAFFE database

Feature   Overall accuracy (%)   Average accuracy (%)   Precision   Recall   Kappa coefficient   Computation time (s)
HOG       46.29                  45.41                  0.45        0.47     0.35                1.6
LBP       79.62                  79.58                  0.79        0.81     0.75                2.1
Gabor     92.59                  92.36                  0.92        0.94     0.91                4.5
5. CONCLUSION

This paper presents a comparative study of three prominent feature extraction techniques used in computer
vision and image processing, applied to emotion recognition from FE image datasets. The feature sets
extracted from the JAFFE and CK+ datasets have been classified with a simple kNN classifier, chosen for its
ease of implementation and fast response. Applying the Gabor filter to the facial images enhances them to
the desired standard, making the emotion models reliable and simple. Although several challenges remain in
RFE systems, there is considerable scope for further work. The developed models could be used in automated
teller machines (ATMs) and in identifying fake voters, passports, visas, and driving licenses. They can also
be applied in defense, in identifying students in competitive exams, and in the private and government
sectors. It can be inferred that multi-resolution Gabor filters remain computationally expensive compared
with simpler descriptors such as HOG and LBP, but they offer higher recognition accuracy. The work can be
extended in the future to other efficient feature extraction techniques that describe facial expressive
states adequately.